ABSTRACT


Analysis of log data generated by online educational systems
is an essential task for improving these systems
and increasing our understanding of how students learn. In
this study we investigate previously unseen data from Clio 
Online, the largest provider of digital learning content for 
primary schools in Denmark. We consider data for 14,810 
students with 3 million sessions in the period 2015-2017. 
We analyze student activity in periods of one week. By 
using non-negative matrix factorization techniques, we ob- 
tain soft clusterings, revealing dependencies among time of 
day, subject, activity type, activity complexity (measured 
by Bloom’s taxonomy), and performance. Furthermore, our 
method allows for tracking behavioral changes of individual 
students over time, as well as general behavioral changes 
in the educational system. Based on the results, we give 
suggestions for behavioral changes, in order to optimize the 
learning experience and improve performance. 
1. INTRODUCTION + RELATED WORK


How students behave in educational systems is an impor- 
tant topic in educational data mining. Knowledge of this 
behavior in an educational system can help us understand 
how students learn, and help guide development towards
optimal learning based on actual use. This behavior can be
understood either through an explicit study [5] or, as in this
paper, through the automatically generated log data of the
system.


The analysis of log data is usually done as an unsupervised 
clustering of students [2, 3, 4, 7]. A popular approach is 
to extract action sequences and transform them into an ag- 
gregated representation using Markov models [4, 7]. The 
Markov chains can then be clustered by different methods. 
Klingler et al. performed student modeling using explicit
Markov chains, clustering them with different distance
measures defined on the Markov chains [7]. Hansen
et al. assumed the action sequences to be generated by a
mixture of Markov chains and used a heuristic algorithm
to find the generating Markov chains [4]. Gelman et al. [3]
used non-negative matrix factorization to find clusters for
different measures of activity aggregated in weekly periods
during a MOOC course. These clusters are then matched
from week to week by cosine similarity.


Our work is similar to that of Gelman et al. [3] in that we also
use non-negative matrix factorization (NMF) to make a
soft clustering at the student level in a given time period.
However, our clustering is made only once, and we look
at primary school data over a much longer period of time
(2 years compared to 14 weeks).


Our soft clustering by non-negative matrix factorization is 
based on log data from Clio Online.¹ Clio Online is the
largest provider of digital learning for all subjects in the 
Danish primary school (except mathematics), having 90% 
of all primary schools in Denmark as customers. 


Using NMF, we assume that the set of features chosen can 
be represented by a smaller set of underlying behaviors. These
underlying behaviors would each be represented by a cluster
in the non-negative matrix factorization. Each student
will then get a number for each cluster in each time period
representing how much of that underlying behavior they have
shown in the given time period. Non-negativity gives the
behaviors an additive structure, which is more natural than 
showing a negative amount of a given behavior. We reason 
that the soft clustering will show both the behaviors of in- 
dividual students, as well as how the behaviors change over 
time, both individually and on a system-wide level. 


In this paper, we consider two main questions: a) how
does student activity in the system affect performance, and
b) how is student activity distributed across the different levels
of Bloom’s taxonomy in different subjects? Both questions
are important with regard to optimizing learning: the
first in relation to performance, the latter in relation to
utilization of all taxonomy levels.


¹ This data is proprietary and not publicly available.
Figure 1: Number of students active in each period.
Note that period 0 starts on 2015-01-08, while pe- 
riod 111 ends on 2017-03-01. The drops in activity 
occur due to vacation in Danish primary school, with 
the two large drops around periods 25 and 79 being 
due to the summer vacation. 


2. EXPERIMENTAL SETUP 


This section describes our experimental setup and methods. 
We start by describing our data and how it is preprocessed, 
and then move on to describing our clustering method. 


2.1 Data Preprocessing 

As mentioned, we consider log data generated in the Dan- 
ish online educational system Clio Online. The system is 
used in Danish primary schools and contains learning ob- 
jects across all Danish subjects (except mathematics), for 
instance texts, videos, sound clips and exercises. Further- 
more, the system includes a large number of quizzes, used 
for evaluating students. Students may use the system for 
self study, but they may also be assigned homework by their 
teacher. Our data covers 14,810 students. 


The raw data consists of logs detailing page accesses for
individual students in the system. For quizzes, the final score
(between 0 and 1) and the total time spent on the quiz are also
available. In our preprocessing, we combine these log entries
into sessions. Two consecutive entries are considered to be in the
same session if they have the same subject and their
timestamps differ by less than some threshold. For our study, we
choose this threshold to be 600 seconds, based on
recommendations from Clio Online, who have a deeper knowledge
of the content and flow of the system (e.g. expected time
per page). Furthermore, quizzes are considered separate
sessions. A total of 3 million sessions is obtained in this way.
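
The session rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `LogEntry` record, its field names and the use of numeric timestamps in seconds are hypothetical and not taken from the Clio Online logs, and the separate handling of quizzes is left out.

```python
from dataclasses import dataclass

SESSION_GAP_SECONDS = 600  # threshold stated in the text


@dataclass
class LogEntry:
    # Hypothetical record layout; the real log schema is not described.
    student: str
    subject: str
    timestamp: float  # seconds since an arbitrary epoch (assumption)


def sessionize(entries, gap=SESSION_GAP_SECONDS):
    """Group one student's log entries into sessions.

    Consecutive entries share a session when they have the same subject
    and their timestamps differ by less than `gap` seconds.
    """
    sessions = []
    for entry in sorted(entries, key=lambda e: e.timestamp):
        last = sessions[-1][-1] if sessions else None
        if (last is not None
                and last.subject == entry.subject
                and entry.timestamp - last.timestamp < gap):
            sessions[-1].append(entry)
        else:
            sessions.append([entry])
    return sessions


log = [
    LogEntry("s1", "history", 0),
    LogEntry("s1", "history", 300),
    LogEntry("s1", "history", 1000),  # 700 s after the previous entry
    LogEntry("s1", "biology", 1100),  # subject change
]
session_lengths = [len(s) for s in sessionize(log)]
```

In this toy log the first two entries form one session, while the long gap and the subject change each start a new one.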


With the sessions defined, we consider student activity in
activity periods with a length of one week. The data spans
a total of 112 activity periods, starting in January 2015 and
ending in March 2017. For each activity period, we add an
entry for a student if the student is active (accesses the
system) within that period. The entry for the given student
contains all sessions for that student that start within
the activity period. We end up with approximately 677,000
student entries across the 112 periods. Figure 1 shows the
number of active students in each period. Note the drop
in active students around periods 25 and 79; these drops in
activity occur due to the summer vacation.
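
As a small illustration of the weekly binning, the following sketch maps a calendar day to its activity period. It assumes that periods are consecutive 7-day windows counted from the start date given in the caption of Figure 1 (2015-01-08); the function and constant names are hypothetical.

```python
from datetime import date

PERIOD_ZERO_START = date(2015, 1, 8)  # start of period 0, per the Figure 1 caption
PERIOD_LENGTH_DAYS = 7                # one-week activity periods


def period_index(day):
    """Return the index of the weekly activity period that `day` falls in."""
    return (day - PERIOD_ZERO_START).days // PERIOD_LENGTH_DAYS


first = period_index(date(2015, 1, 8))
last = period_index(date(2017, 3, 1))
```

Under this assumption the end date from the caption, 2017-03-01, falls in period 111, which agrees with the 112 periods reported above.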


The final step of data preprocessing is the feature extraction.
For each activity period, a set of activity/performance
related features is extracted. The features are chosen so
as to answer the questions posed in the previous section. A
complete overview of all features considered in our
experiments is given in Table 1, including the maximum, mean
and variance across all active students in all periods. Not
all features are used in every experiment; see section 3.


All features are aggregates over the activity period. Below 
follows a detailed description: 


• f1 describes the activity during the period of the day when
  Danish students are normally in school, while f2
  describes the activity during non-school hours.

• f3, f4 and f5 describe time spent doing exercises, reading
  texts and taking quizzes, respectively.

• f6, f7 and f8 describe time spent working with different
  topics: languages (Danish, English, German), societal
  (social studies, history, etc.) and science (physics,
  biology, etc.), respectively.

• f9 is the average session length during the activity period.

• f10 is the average quiz score; this feature may be missing
  if a student takes no quizzes during an activity
  period, but our analysis methods can handle this, see
  section 2.2.

• f11, f12, f13 and f14 describe the time spent doing exercises
  of different complexity, measured by their level in
  Bloom’s taxonomy. We regroup the levels of Bloom’s
  taxonomy into 4 levels:

    1 Remember/Understand: Exercises involving
      reading and describing, e.g. "Read a map".

    2 Apply: Exercises involving application of previously
      learned concepts, e.g. "Practice adjectives".

    3 Analyze/Evaluate: Exercises involving discussion,
      analysis and experimenting, e.g. "Work with
      the poem", "Analyze the game".

    4 Create: Exercises involving creation of a product,
      e.g. "Create a cartoon", "Write a story".




Having extracted m features for each student in each period,
we construct the matrix $X \in \mathbb{R}^{n \times m}$, where each of the n
rows consists of the feature vector for an active student in
a given activity period. Thus each student occurs several
times in X: once for each period in which they are active.


2.2 Soft Clustering using Non-negative Matrix Factorization

We will utilize non-negative matrix factorization for our soft 
clustering. The use of NMF as a soft clustering technique 
has become popular in recent times [10], with applications 
within several fields, such as clustering of images and docu- 
ments [8, 13]. NMF has also seen success in the educational 
data mining community, for clustering tasks, as well as other 
tasks such as performance prediction [3, 12]. 
 i | Feature f_i                           |    Max |  Mean | Variance
---+---------------------------------------+--------+-------+---------
 1 | Hours between 8AM and 4PM             |  31.85 | 0.940 |    0.862
 2 | Hours before 8AM and after 4PM        |  71.84 | 0.174 |    0.283
 3 | Hours doing exercises                 |   3.61 | 0.048 |    0.019
 4 | Hours reading texts                   |   7.73 | 0.344 |    0.148
 5 | Hours taking quizzes                  |  23.76 | 0.231 |    0.297
 6 | Hours working with language subjects  |  58.28 | 0.531 |    0.693
 7 | Hours working with societal subjects  |  45.96 | 0.294 |    0.285
 8 | Hours working with science subjects   | 103.69 | 0.277 |    0.326
 9 | Average session length in hours       |   7.91 | 0.268 |    0.027
10 | Average quiz score (in [0, 1])        |   1.00 | 0.733 |    0.034
11 | Hours working with Bloom level 1      |   2.83 | 0.016 |    0.006
12 | Hours working with Bloom level 2      |   1.64 | 0.008 |    0.002
13 | Hours working with Bloom level 3      |   1.51 | 0.014 |    0.003
14 | Hours working with Bloom level 4      |   2.04 | 0.009 |    0.003

Table 1: Overview of features.
Figure 2: The soft clustering given by NMF.


NMF is a dimensionality reduction method, in which we are
given a non-negative matrix $X \in \mathbb{R}^{n \times m}$ and $k \in \mathbb{N}$, and wish
to determine non-negative matrices $U \in \mathbb{R}^{n \times k}$ and
$V \in \mathbb{R}^{k \times m}$ such that $X \approx UV$.
More specifically, we search for U and V such that the error
$\|X - UV\|_F$ is minimized, where $\|\cdot\|_F$ is the Frobenius
norm. For our analysis, we need to be able to handle missing
values in X. In this case the NMF problem is reformulated
as weighted non-negative matrix factorization (WNMF), in which
we are also given a binary weight matrix $W \in \{0,1\}^{n \times m}$,
where a 0 indicates missing data. Now, we wish to find U and V
such that $\|W \odot (X - UV)\|_F$ is minimized.²
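
To make the weighted objective concrete, here is a minimal NumPy sketch. Representing missing values as NaN and replacing them with a placeholder of 0 are assumptions made for this illustration; any placeholder works, since the corresponding weights are zero.

```python
import numpy as np


def weighted_error(X, W, U, V):
    """Frobenius norm of W * (X - U V); entries with weight 0 are ignored."""
    return np.linalg.norm(W * (X - U @ V), ord="fro")


# Toy data: NaN marks a missing value (e.g. no quiz taken in that period).
X_raw = np.array([[1.0, 2.0],
                  [np.nan, 4.0]])
W = (~np.isnan(X_raw)).astype(float)  # 1 = observed, 0 = missing
X = np.nan_to_num(X_raw, nan=0.0)     # placeholder value, masked out by W

U = np.array([[1.0], [2.0]])
V = np.array([[1.0, 2.0]])
masked_err = weighted_error(X, W, U, V)
unmasked_err = weighted_error(X, np.ones_like(X), U, V)
```

Here the factorization reproduces every observed entry, so the weighted error is zero, while the unweighted error would wrongly penalize the placeholder.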


U and V admit a soft k-clustering as shown in Figure 2: V
describes the importance of each feature for each cluster (for
instance, f1 has high importance in C1), while U describes
the membership of each data point in the different clusters
(for instance, x3 is mostly in C1, while x4 is in both clusters).


Note that for NMF, we have $X \approx UV = UIV = UA^{-1}AV$,
where I is the k × k identity matrix and A is a k × k invertible
matrix. This means that we may rescale U and V by
this matrix, A, and its inverse. In our analysis, we use this
to rescale V such that all rows of V (the clusters) sum to
one, thus making the clusters comparable and membership
of the different clusters easier to interpret.
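
A minimal sketch of this rescaling, assuming A is chosen as the diagonal matrix holding the reciprocal row sums of V (one natural choice that keeps both factors non-negative) and that no row of V sums to zero. The function name is hypothetical.

```python
import numpy as np


def normalize_clusters(U, V):
    """Rescale so every row of V sums to one while keeping U @ V unchanged."""
    row_sums = V.sum(axis=1)          # one scale factor per cluster
    V_scaled = V / row_sums[:, None]  # A V, with A = diag(1 / row_sums)
    U_scaled = U * row_sums[None, :]  # U A^{-1}
    return U_scaled, V_scaled


U = np.array([[1.0, 0.5], [0.0, 2.0]])
V = np.array([[2.0, 2.0], [1.0, 3.0]])
U2, V2 = normalize_clusters(U, V)
```

The product is preserved because each column of U is multiplied by exactly the factor that the matching row of V is divided by.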


There exist several algorithms for obtaining the non-negative
matrix factorization of X, for instance basic gradient descent,
multiplicative update rules and alternating least squares;
[1] gives a good overview in the non-weighted setting. Several
of these algorithms have been adapted for the WNMF
case, while approaches based on expectation maximization
have also been proposed, see [6]. For our analysis, we will use
the weighted version of the multiplicative update method
proposed by Lee and Seung [9].

² $\odot$ denotes the Hadamard product (element-wise multiplication).


The NMF algorithm given in [9], adapted to WNMF [6], is
as follows:

1. Initialize U and V.
2. Repeatedly update U and V by the following rules:

$$U \leftarrow U \odot \frac{(W \odot X)V^T}{(W \odot (UV))V^T}$$

$$V \leftarrow V \odot \frac{U^T(W \odot X)}{U^T(W \odot (UV))}$$

where division is done element-wise.
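
The two update rules translate almost directly into NumPy. The sketch below is illustrative rather than the implementation used in the experiments: it runs a fixed number of iterations instead of an error-based stopping rule, and it adds a small constant `eps` to the denominators to avoid division by zero. Both choices, as well as the function and parameter names, are assumptions.

```python
import numpy as np


def wnmf(X, W, k, n_iter=500, eps=1e-9, seed=0):
    """Weighted NMF by multiplicative updates (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.random((n, k)) + eps  # random non-negative initialization
    V = rng.random((k, m)) + eps
    WX = W * X                    # W ⊙ X is constant across iterations
    for _ in range(n_iter):
        U *= (WX @ V.T) / ((W * (U @ V)) @ V.T + eps)
        V *= (U.T @ WX) / (U.T @ (W * (U @ V)) + eps)
    return U, V


rng = np.random.default_rng(1)
X = rng.random((6, 4))
W = np.ones_like(X)
W[0, 0] = 0.0                     # treat one entry as missing
U, V = wnmf(X, W, k=2)
```

Because the updates only multiply by non-negative ratios, U and V stay non-negative when the initialization is non-negative.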


The literature explores several ways of initializing U and V;
in our case, we simply use random initialization. The
alternating update steps are applied until the decrease
in error falls below a set threshold. Lin has noted
that the procedure described above may not converge to
a stationary point, hence we modify the update rules as
proposed in [11]. Furthermore, since in our case we
know all missing values of X to be bounded by a constant c,
we modify the above procedure such that 0-weight values of
UV that exceed c are penalized, i.e. whenever a value
$(UV)_{ij}$ with $W_{ij} = 0$ gets larger than c, we set $X_{ij} = c$ and
$W_{ij} = 1$ before the next update step. If $(UV)_{ij}$ decreases
below c again, the weight is reset to 0.
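
One way to read the bound on missing values is as a bookkeeping step that runs before each update. The sketch below is an interpretation, not the authors' code: the helper name and the explicit `missing` mask (which remembers the entries that were originally missing) are assumptions. For a quiz score, which lies between 0 and 1, a plausible value would be c = 1.

```python
import numpy as np


def apply_bound(X, W, missing, U, V, c):
    """Activate or release the penalty on originally missing entries.

    `missing` is a boolean mask of the entries that were missing in the
    raw data. Where the reconstruction U @ V exceeds c on such an entry,
    the target is set to c and the weight to 1; otherwise the weight of a
    missing entry is reset to 0. Observed entries are left untouched.
    """
    too_large = missing & (U @ V > c)
    X_new = np.where(too_large, c, X)
    W_new = np.where(missing, too_large.astype(float), W)
    return X_new, W_new


X = np.array([[1.0, 0.0]])
W = np.array([[1.0, 0.0]])
missing = np.array([[False, True]])
U = np.array([[1.0]])

# Reconstruction of the missing entry is 2 > c, so the penalty is switched on.
X_on, W_on = apply_bound(X, W, missing, U, np.array([[1.0, 2.0]]), c=1.0)
# Reconstruction drops to 0.5 <= c, so the weight is reset to 0.
X_off, W_off = apply_bound(X_on, W_on, missing, U, np.array([[1.0, 0.5]]), c=1.0)
```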


It remains to describe how we select the number of clusters,
k. For each experiment, we construct clusterings with k =
1, 2, ..., and stop when the decrease in error going from k
clusters to k + 1 clusters is below some threshold, which
depends on the initial error. As a consequence, clusters will
be uncorrelated on a student level, since otherwise we would
pick a lower k.
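
The selection rule can be sketched as a simple loop. Since the text does not specify how the threshold depends on the initial error, the sketch assumes a fixed fraction of the error at k = 1; that choice, the cap `k_max` and the callback `fit_error` (assumed to return the final factorization error for a given k) are all hypothetical.

```python
def select_k(fit_error, rel_threshold=0.05, k_max=20):
    """Return the first k for which adding one more cluster helps too little.

    `fit_error(k)` is assumed to return the final error obtained with k
    clusters. The threshold is taken as a fraction of the error at k = 1,
    which is only one possible reading of "depends on the initial error".
    """
    initial = fit_error(1)
    previous = initial
    for k in range(1, k_max):
        current = fit_error(k + 1)
        if previous - current < rel_threshold * initial:
            return k
        previous = current
    return k_max


# Synthetic error curve, for illustration only.
errors = {1: 10.0, 2: 6.0, 3: 3.0, 4: 2.9, 5: 2.8}
chosen_k = select_k(errors.__getitem__, rel_threshold=0.05, k_max=5)
```

With this synthetic curve the improvement from 3 to 4 clusters (0.1) is below the threshold (0.5), so the loop stops at k = 3.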
Figure 3: The cluster matrix for the first experiment.


3. EXPERIMENTS AND RESULTS 


In this section, we present two different experiments using 
the setup described above. In the first experiment, we inves- 
tigate the relation between activity, activity type, subject, 
time of day, average session length and performance. In 
the second experiment, we investigate the relation between 
complexities of exercises and subjects. 


3.1 Performance and Optimal Behavior 

In the first experiment, we investigate the relation between 
activity, activity type, subject, time of day, average session 
length and performance, i.e. we consider features f1, ..., f10.
The features are extracted and k = 5 is selected, as described 
in section 2. We run the WNMF algorithm, and obtain the 
cluster matrix V as shown in Figure 3. From the figure, we 
can make several observations about the clusters: 


C1 In this cluster, we find students mostly working with
   the science subjects (f8). These students seem to work
   mostly during school hours (f1). The students also
   seem to spend a lot of time reading (f4).

C2 Students in this cluster spend a lot of time taking
   quizzes (f5). They spend some time during school
   hours (f1) and some time working with language
   subjects (f6). Furthermore, students in this cluster seem
   to have both fairly long average session length and high
   performance (f9 and f10).

C3 In cluster C3, we see students working with societal
   subjects (f7). They work during school hours (f1) and
   spend time reading texts in the system (f4).

C4 This cluster shows a relationship between being active
   in school (f1) and spending time in the language
   subjects (f6). Students in this cluster also spend time
   reading texts (f4) and doing some exercises (f3).

C5 The most important feature for C5 is f2, i.e. the students
   in this cluster spend most time using the system
   during non-school hours. These students spend time in
   all subjects, but mostly languages (f6), and they spend
   time taking quizzes (f5).


From the clusters, we can see that the impact on performance
from different behaviors depends on the subject. From
cluster C2, we see that students working mostly with language
subjects gain most performance from spending time
taking quizzes and working during school hours, whereas
students working mostly with societal (cluster C3) and science
(cluster C1) subjects gain most from reading texts
while working mostly during school hours. Note that cluster
C4 indicates that students working with languages may also
improve performance by reading texts, but to a lesser degree
than students working in other subjects. Finally, C5 indicates
that working mostly from home and primarily taking
quizzes does not improve performance. While C5 indicates
this for all subjects, the high importance of f6 indicates
that this most often occurs for students working with languages,
confirming the observations from C2. It is
also worth noticing that there is a strong relation between
performance and average session length (clusters C1, C2 and
C3), indicating that students who perform well also have
longer sessions on average.

Figure 4: The distribution of cluster membership
for the first experiment.

Figure 5: The average cluster membership in each
activity period for the first experiment.


From the above discussion, it appears that the behavior in
clusters C4 and C5 is sub-optimal when considering performance,
while students gain more from being in C1, C2 or
C3, i.e. by working during school hours, having longer sessions
and taking quizzes (in the case of languages) or reading
texts (in the case of societal or science subjects).


Figure 4 describes the distribution of cluster membership
across all students and all activity periods, i.e. the columns
of the first interval [0, 0.1) give for each cluster the fraction
of students with 0%-10% membership. We see that we do
indeed get a soft clustering, with students often belonging
to more than one cluster. Only C3 seems to be the single
dominant cluster of some students. From the figure, we
also see that students are typically never exclusively in C5,
which is positive, as the behavior observed in that cluster
was not very productive in terms of performance. Other
than that, we generally observe that students seem to be
distributed fairly evenly among the top four clusters, indicating
that most time is spent during school hours, with a varied use of both
quizzes and texts across all subjects.

Figure 6: The cluster matrix for the second experiment.
Note that a logarithmic scale is used for this plot.


Next, we analyze how the membership of the different clusters
changes over time. Figure 5 plots the average membership
for each period, i.e. the average of the rows of U belonging to
the given period. The first observation we make from Figure 5
is that clusters C1, C2, C3 and C4 appear correlated
at the system-wide level. This is due to these clusters being
dependent on the general activity in the online system; most
of the sudden drops occur at the same time as Danish school
vacations, most notably the two larger drops around activity
periods 25 and 79 (see Figure 1). C5 seems to be relatively
unaffected by the general activity, but this makes sense, as
C5 contains mostly students who work outside school hours,
and thus a lower membership is expected in that cluster in
general, which is also the pattern we see in periods with no
vacation.
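
The per-period averages described above can be computed from U with a grouped mean. In this sketch, `periods` is a hypothetical integer array holding the activity period of each row of U; periods without any active student yield NaN.

```python
import numpy as np


def average_membership(U, periods, n_periods):
    """Average the rows of U that belong to each activity period."""
    sums = np.zeros((n_periods, U.shape[1]))
    counts = np.zeros(n_periods)
    np.add.at(sums, periods, U)    # accumulate memberships per period
    np.add.at(counts, periods, 1)  # number of student entries per period
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts[:, None]


U = np.array([[1.0, 0.0],
              [3.0, 2.0],
              [0.0, 4.0]])
periods = np.array([0, 0, 1])      # rows 0 and 1 belong to period 0
avg = average_membership(U, periods, n_periods=2)
```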


Looking at the general distribution between the different
clusters, C3 and C4 seem to be the most dominant, indicating
that most students are working with language and
societal subjects and reading texts. Cluster C1 (science subjects)
is fairly constant in the non-vacation periods, and C2
seems to increase starting from period 80, indicating that more
students spend time taking quizzes. Finally, as mentioned,
C5 is the least active cluster across most periods. One general
trend for the top four clusters seems to be an increase
in activity during the 112 periods, indicating that students
are spending more time in the system on average.


3.2 Subject and Exercise Complexity 
In the second experiment we look at the relation between 
subjects and exercises grouped by Bloom’s taxonomy level, 
i.e. we consider features f6, f7, f8, f11, f12, f13 and f14.


We expect three clusters, one for each of the subject classes, 
which will tell us how much each Bloom level is used within 
each subject class. Figure 6 shows the cluster matrix found. 
From Figure 6, we make the following observations: 
Figure 7: The distribution of cluster membership
for the second experiment.

Figure 8: The average cluster membership in each
activity period for the second experiment.


C1 In the science subjects, only very little of the 3 higher
levels is used, and almost none of reading and understanding.

C2 For societal subjects, students have only little activity 
in the first 2 levels, a lot in analyzing and evaluating, 
and very little activity in creation. 

C3 In languages, students have a tendency to read and 
understand a lot, and then distribute almost evenly 
on the 3 higher levels. 


This implies that if we want to attract students to use an 
online educational system for languages, focus should be on 
exercises with Bloom’s taxonomy level read and understand. 
For societal subjects the focus should be on exercises with 
analyzing and evaluating. For science we see no preference. 


From Figure 7, we see that the clustering has many high
membership values, which is most likely explained by teachers
who use the system exclusively in one of the subjects;
we can see that this happens most often for languages.


As we can see in Figure 8, all three clusters follow similar
curves, which is partly explained by holidays. In particular, the
science and societal clusters seem highly correlated
on a general level. We also see that in all three subjects, the
average time spent during a week has gone from 15 minutes
to 45 minutes for languages and 25 minutes for both societal
subjects and sciences. This is a clear indication that teachers and
students in Denmark are using online educational systems
more, especially for languages.


4. CONCLUSIONS AND FUTURE WORK 


Several points can be taken from our analysis. We have 
identified three optimal and two sub-optimal behaviors in
relation to subject and performance. One notable conclusion
is that students using the Clio Online system during
non-school hours (at home) do not seem to gain any significant
boost to performance. We also saw how taking quizzes
seems to increase the performance of students in languages,
more so than in other subjects, where reading texts is of
more importance. This fits the intuition that skills such as
grammar need to be trained in order to be learned. We have
shown how exercises are used depending both on their subject
and their level in Bloom’s taxonomy. Lastly, we see that
the average amount of time spent in the system is increasing 
both generally and for the individual students in all subjects, 
but especially for students working with languages. Further- 
more, both experiments show how behaviors can have high 
correlation on a system-wide level, despite being uncorre- 
lated on the individual student level. While the change of 
behavior for individual students was not directly analyzed in 
this paper (due to privacy concerns), our method allows for 
tracking such individual changes, hopefully helping teachers 
encourage optimal student behavior, e.g. by recommend- 
ing training quizzes for students working with languages, or 
making sure that students are allowed more time to use the 
system in school. 


In our setting, the number of clusters is fixed. It may be 
interesting to use an adaptive clustering strategy instead, 
as done in [7], as one might expect clusters to change over 
time. In the future, it might also be interesting to include 
other features, that were not available to us at this time, for 
instance whether a text (or quiz) has been assigned by a
teacher, or whether the student reads it on their own initiative. For
this study, we also only had access to a limited amount of 
data; better and more reliable results might be obtained by 
including more data. 